Identifying duplicate content using statistically improbable phrases
Similar resources
Identifying duplicate content using statistically improbable phrases
MOTIVATION: Document similarity metrics such as PubMed's 'Find related articles' feature, which have been primarily used to identify studies with similar topics, can now also be used to detect duplicated or potentially plagiarized papers within literature reference databases. However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of...
Identifying Generic Noun Phrases
This paper presents a supervised approach for identifying generic noun phrases in context. Generic statements express rulelike knowledge about kinds or events. Therefore, their identification is important for the automatic construction of knowledge bases. In particular, the distinction between generic and non-generic statements is crucial for the correct encoding of generic and instance-level i...
Statistically identifying basic color terms
Basic color terms were originally defined by Berlin and Kay based on linguistic definitions and refer to a subset of color expressions that are universal across languages. In this paper, we investigate and report that basic color terms demonstrate much stronger statistical characteristics that differentiate them from other color words. We introduce a probabilistic interpretation of color naming...
Identifying synonymy between relational phrases using word embeddings
Many text mining applications in the biomedical domain benefit from automatic clustering of relational phrases into synonymous groups, since it alleviates the problem of spurious mismatches caused by the diversity of natural language expressions. Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual stri...
Invited commentary: Identifying the improbable, the value of incremental insights.
There has been a long-standing debate about whether birth weight directly affects adult blood pressure, or whether the association is entirely mediated through current weight. In this issue of the Journal, Chiolero et al. (Am J Epidemiol. 2014;179(1):4-11) quantitatively evaluate whether bias from an unmeasured confounder of the relationship between current weight and current blood pressure cou...
Journal
Journal title: Bioinformatics
Year: 2010
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/btq146